46 research outputs found

    Cross-lingual and cross-domain discourse segmentation of entire documents

    Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters exist for only a few languages and domains. Typically, they only detect intra-sentential segment boundaries, assuming gold-standard sentence and token segmentation and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total. Comment: To appear in Proceedings of ACL 201
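The task described above is commonly cast as token-level sequence labeling, where each token is tagged B (beginning of an elementary discourse unit) or I (inside one), and systems are scored with boundary F1. The sketch below illustrates only that task setup and metric, not the paper's statistical model; the function names are hypothetical.

```python
def bio_to_spans(labels):
    """Convert a B/I label sequence into (start, end) EDU spans."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B":
            if start is not None:
                spans.append((start, i))
            start = i
    if start is not None:
        spans.append((start, len(labels)))
    return spans

def boundary_f1(gold, pred):
    """F1 over segment-boundary (B) positions, a standard segmentation metric."""
    g = {i for i, lab in enumerate(gold) if lab == "B"}
    p = {i for i, lab in enumerate(pred) if lab == "B"}
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, the labels `["B", "I", "B", "I", "I"]` decode to two EDUs, spans (0, 2) and (2, 5).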

    Does syntax help discourse segmentation? Not so much

    Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold-standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that relies only on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation.
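A POS-only segmenter of the kind described above typically represents each token by the part-of-speech tags in a small window around it. This is a minimal sketch of such a feature extractor, assuming a symmetric window of size k; the feature names and padding token are illustrative, not taken from the paper.

```python
def pos_window_features(pos_tags, i, k=2):
    """Feature dict for token i: POS tags in a +/-k window.

    Positions outside the sentence are padded, so no sentence-boundary
    information beyond the tag sequence itself is required.
    """
    feats = {}
    for d in range(-k, k + 1):
        j = i + d
        feats[f"pos[{d}]"] = pos_tags[j] if 0 <= j < len(pos_tags) else "<PAD>"
    return feats
```

Such features can feed any off-the-shelf sequence classifier, which is what makes the approach portable across languages that share a universal POS tag set.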

    Label Pre-annotation for Building Non-projective Dependency Treebanks for French

    The current interest in accurate dependency parsing makes it necessary to build dependency treebanks for French containing both projective and non-projective dependencies. In order to alleviate the work of the annotator, we propose to automatically pre-annotate the sentences with the labels of the dependencies ending on each word. The selection of the dependency labels reduces the ambiguity of the parsing. We show that a maximum entropy Markov model method reaches the label accuracy score of a standard dependency parser (MaltParser). Moreover, this method makes it possible to propose more than one label per word, i.e. the most probable ones, in order to improve the recall score. This improves the quality of the parsing step of the annotation process. Therefore, including the method in the annotation process makes the work quicker and more natural for annotators.
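The recall-oriented idea above (keeping several candidate labels per word rather than only the single best one) can be sketched as follows. The probability distribution would come from the MEMM; the function name and the k/threshold parameters are hypothetical.

```python
def topk_labels(prob_dist, k=2, min_p=0.1):
    """Keep the k most probable dependency labels for one word.

    prob_dist maps label -> probability. Returning several plausible labels
    instead of the argmax trades precision for recall, which suits
    pre-annotation: the human annotator discards wrong candidates cheaply,
    while a missing correct label would force manual relabeling.
    """
    ranked = sorted(prob_dist.items(), key=lambda kv: -kv[1])
    return [lab for lab, p in ranked[:k] if p >= min_p]
```

For example, `topk_labels({"obj": 0.6, "nsubj": 0.3, "det": 0.1}, k=2)` keeps both `obj` and `nsubj` as candidates.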

    A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors

    While rule-based detection of subject-verb agreement (SVA) errors is sensitive to syntactic parsing errors and to irregularities and exceptions to the main rules, neural sequence labelers tend to overfit their training data. We observe that rule-based error generation is less sensitive to syntactic parsing errors and irregularities than error detection, and we explore a simple yet effective approach to getting the best of both worlds: we train neural sequence labelers on a combination of large volumes of silver-standard data, obtained through rule-based error generation, and gold-standard data. We show that our simple protocol leads to more robust detection of SVA errors on both in-domain and out-of-domain data, as well as in the context of other errors and long-distance dependencies; across four standard benchmarks, the induced model on average achieves a new state of the art.
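Rule-based error generation of the kind used above can be as simple as flipping the grammatical number of a verb in a correct sentence and recording the corrupted position as the error label. This toy sketch shows the idea; the `FLIP` lexicon covers only a few auxiliaries, whereas a real generator would handle full verb morphology, and nothing here is taken from the paper's actual rules.

```python
# Toy lexicon mapping singular <-> plural verb forms (assumption, not the
# paper's rule set).
FLIP = {"is": "are", "are": "is", "has": "have", "have": "has",
        "was": "were", "were": "was"}

def corrupt(tokens):
    """Flip the number of the first flippable verb to create a silver
    training example: the corrupted sentence plus token-level error labels."""
    labels = ["O"] * len(tokens)
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok.lower() in FLIP:
            out[i] = FLIP[tok.lower()]
            labels[i] = "ERR"
            break
    return out, labels
```

Running `corrupt(["the", "dog", "is", "happy"])` yields the corrupted sentence with `is` replaced by `are` and an `ERR` label at that position; large volumes of such pairs can then be mixed with gold-standard data during training.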

    From syntactic tagging for categorial dependency grammars to transition-based parsing in the domain of non-projective dependency analysis

    This thesis is situated in the domain of syntactic dependency parsing. On the one hand, we study the effect of a statistical syntactic-tagging method on a parser based on Categorial Dependency Grammars (CDG). We propose a pre-annotation process comprising word segmentation of sentences (including compound words), POS tagging and syntactic tagging of those words, and dependency analysis of the sentence, in order to alleviate the burden on annotators when building non-projective dependency treebanks for French. On the other hand, we study fully data-driven methods for dependency parsing through the adaptation of a transition-based parser to the dependency representation induced by categorial dependency grammars. Moreover, we propose a three-step transition-based method that first predicts the projective dependencies and then, separately, the right and left non-projective dependencies, in order to increase prediction scores on non-projective dependencies. We show that this method can be adapted to any standard dependency treebank.
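The motivation for treating non-projective dependencies separately, as in the three-step method above, is that they correspond to crossing arcs, which plain transition-based parsers cannot produce. A minimal sketch of detecting such crossings in a head-indexed tree (the helper name and root convention are assumptions for illustration):

```python
def crossing_arcs(heads):
    """Return pairs of crossing (non-projective) dependency arcs.

    heads[i] is the head index of token i, with -1 marking the root.
    Two arcs cross when one's endpoints interleave with the other's,
    i.e. a[0] < b[0] < a[1] < b[1].
    """
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    crossing = []
    for a in arcs:
        for b in arcs:
            if a < b and a[0] < b[0] < a[1] < b[1]:
                crossing.append((a, b))
    return crossing
```

A tree with no crossing arcs is projective and can be handled by the first, standard prediction step; the remaining crossing arcs are what the separate second and third steps target.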

    Validation Issues induced by an Automatic Pre-Annotation Mechanism in the Building of Non-projective Dependency Treebanks

    In order to build large dependency treebanks using the CDG Lab, a grammar-based dependency treebank development tool, an annotator usually has to fill in a selection form before parsing. This step is usually necessary because, otherwise, the search space is too big for long sentences and the parser fails to produce even one solution. With the information given by the annotator on the selection form, the parser can produce one or several dependency structures, and the annotator can then proceed by adding positive or negative annotations on dependencies and iteratively re-running the parser until the right dependency structure has been found. However, the selection form is sometimes difficult and time-consuming to fill in, because the annotator must have an idea of the result before parsing. The CDG Lab proposes to replace this form with an automatic pre-annotation mechanism. However, this mechanism introduces some issues during the annotation phase that do not arise when the annotator uses a selection form. This article presents those issues and proposes some modifications to the CDG Lab in order to use the automatic pre-annotation mechanism effectively.